Mixture of Gaussians for distance estimation with missing data

نویسندگان

  • Emil Eirola
  • Amaury Lendasse
  • Vincent Vandewalle
  • Christophe Biernacki
چکیده

Many data sets have missing values in practical application contexts, but the majority of commonly studied machine learning methods cannot be applied directly when there are incomplete samples. However, most such methods only depend on the relative differences between samples instead of their particular values, and thus one useful approach is to directly estimate the pairwise distances between all samples in the data set. This is accomplished by fitting a Gaussian mixture model to the data, and using it to derive estimates for the distances. A variant of the model for high-dimensional data with missing values is also studied. Experimental simulations confirm that the proposed method provides accurate estimates compared to alternative methods for estimating distances. In particular, using the mixture model for estimating distances is on average more accurate than using the same model to impute any missing values and then calculating distances. The experimental evaluation additionally shows that more accurately estimating distances leads to improved prediction performance for classification and regression tasks when used as inputs for a neural network.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Handling Missing Data with Variational Bayesian Learning of ICA

Missing data is common in real-world datasets and is a problem for many estimation techniques. We have developed a variational Bayesian method to perform Independent Component Analysis (ICA) on high-dimensional data containing missing entries. Missing data are handled naturally in the Bayesian framework by integrating the generative density model. Modeling the distributions of the independent s...

متن کامل

Variational Bayesian Learning of ICA with Missing Data

Missing data are common in real-world data sets and are a problem for many estimation techniques. We have developed a variational Bayesian method to perform independent component analysis (ICA) on highdimensional data containing missing entries. Missing data are handled naturally in the Bayesian framework by integrating the generative density model. Modeling the distributions of the independent...

متن کامل

­­Image Segmentation using Gaussian Mixture Model

Abstract: Stochastic models such as mixture models, graphical models, Markov random fields and hidden Markov models have key role in probabilistic data analysis. In this paper, we used Gaussian mixture model to the pixels of an image. The parameters of the model were estimated by EM-algorithm.   In addition pixel labeling corresponded to each pixel of true image was made by Bayes rule. In fact,...

متن کامل

IMAGE SEGMENTATION USING GAUSSIAN MIXTURE MODEL

  Stochastic models such as mixture models, graphical models, Markov random fields and hidden Markov models have key role in probabilistic data analysis. In this paper, we have learned Gaussian mixture model to the pixels of an image. The parameters of the model have estimated by EM-algorithm.   In addition pixel labeling corresponded to each pixel of true image is made by Bayes rule. In fact, ...

متن کامل

Parameter Estimation in Gaussian Mixture Models with Malicious Noise, without Balanced Mixing Coefficients

We consider the problem of estimating the means of two Gaussians in a 2-Gaussian mixture, which is not balanced and is corrupted by noise of an arbitrary distribution. We present a robust algorithm to estimate the parameters, together with upper bounds on the numbers of samples required for the estimate to be correct, where the bounds are parametrised by the dimension, ratio of the mixing coeff...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Neurocomputing

دوره 131  شماره 

صفحات  -

تاریخ انتشار 2014